Dialect Identification Using Resource-Efficient Fine-Tuning Approaches
Lin, Zirui, Gulzar, Haris, Busto, Monnika Roslianna, Masaki, Akiko, Eda, Takeharu, Nakadai, Kazuhiro
Dialect Identification (DI) is the task of recognizing different dialects of the same language from a speech signal. DI can improve downstream speech-related tasks even when speakers have a strong dialect. However, fine-tuning a speech model for tasks like DI is expensive in terms of computational cost and memory requirements. Recent studies have explored fine-tuning pre-trained speech models for tasks like DI using Parameter-Efficient Fine-Tuning (PEFT) methods, which offer parameter efficiency but only limited improvements in memory efficiency and training speed. To address these challenges, we explore Memory-Efficient Fine-Tuning (MEFT) methods, originally proposed for language processing, and apply them to a general-purpose pre-trained speech model. We then comprehensively analyze GPU memory usage and fine-tuning speed across the various MEFT methods. As a case study, we fine-tune the Whisper model to identify six Mandarin subdialects from the KeSpeech dataset, reducing GPU memory usage by up to 73.25% and accelerating training by a factor of 2.1, while maintaining accuracy comparable to vanilla fine-tuning and PEFT methods.
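As a rough illustration of the memory-saving idea behind one family of MEFT methods (a sketch under assumed names and shapes, not the paper's actual architecture), a reversible coupling layer lets intermediate activations be recomputed exactly from the layer's outputs during the backward pass, so they need not be stored:

```python
import numpy as np

# Minimal sketch of a reversible coupling layer, one ingredient used by some
# Memory-Efficient Fine-Tuning (MEFT) methods. Weights, shapes, and the
# tanh sub-functions are illustrative assumptions.
rng = np.random.default_rng(0)
d = 4
Wf = rng.standard_normal((d, d)) * 0.1  # weights of sub-function f
Wg = rng.standard_normal((d, d)) * 0.1  # weights of sub-function g

f = lambda h: np.tanh(h @ Wf)
g = lambda h: np.tanh(h @ Wg)

def forward(x1, x2):
    # Reversible coupling: the outputs fully determine the inputs.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def invert(y1, y2):
    # Recompute the inputs from the outputs; no activations need to be
    # cached for backprop, which is where the memory saving comes from.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, d))
y1, y2 = forward(x1, x2)
r1, r2 = invert(y1, y2)
print(np.allclose(x1, r1) and np.allclose(x2, r2))  # True: inputs recovered
```

Because the inversion reuses the exact same `f` and `g` computations, the reconstruction is exact up to floating-point error, trading a little recomputation for a large reduction in activation memory.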
Mitigating Gender Bias in Depression Detection via Counterfactual Inference
Hu, Mingxuan, Ma, Hongbo, Wu, Xinlan, Liu, Ziqi, Liu, Jiaqi, Chen, Yangbin
Audio-based depression detection models have demonstrated promising performance but often suffer from gender bias due to imbalanced training data. Epidemiological statistics show a higher prevalence of depression in females, leading models to learn spurious correlations between gender and depression. Consequently, models tend to over-diagnose female patients while underperforming on male patients, raising significant fairness concerns. To address this, we propose a novel Counterfactual Debiasing Framework grounded in causal inference. We construct a causal graph to model the decision-making process and identify gender bias as the direct causal effect of gender on the prediction. During inference, we employ counterfactual inference to estimate and subtract this direct effect, ensuring the model relies primarily on authentic acoustic pathological features. Extensive experiments on the DAIC-WOZ dataset using two advanced acoustic backbones demonstrate that our framework not only significantly reduces gender bias but also improves overall detection performance compared to existing debiasing strategies.
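The subtraction step can be sketched with a toy linear scorer (the model, weights, and reference input below are illustrative assumptions, not the paper's implementation): the direct effect of gender is estimated by holding the audio input at a reference value, then subtracted from the factual prediction at inference time.

```python
# Toy sketch of counterfactual debiasing by logit subtraction.
# Hypothetical linear "model": logit = w_audio * audio + w_gender * gender + b.
w_audio, w_gender, b = 2.0, 1.5, -0.5

def logit(audio_feat, gender):
    # gender: 1.0 = female, 0.0 = male (illustrative encoding)
    return w_audio * audio_feat + w_gender * gender + b

def debiased_logit(audio_feat, gender, audio_ref=0.0):
    total = logit(audio_feat, gender)                           # factual prediction
    direct = logit(audio_ref, gender) - logit(audio_ref, 0.0)   # gender-only pathway
    return total - direct                                       # subtract direct effect

# Same acoustic evidence, different gender: the raw logits differ,
# while the debiased logits agree.
print(logit(0.8, 1.0), logit(0.8, 0.0))
print(debiased_logit(0.8, 1.0), debiased_logit(0.8, 0.0))
```

In this toy model the debiased score depends only on the acoustic feature, mirroring the goal of keeping predictions tied to pathological acoustic cues rather than demographic priors.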
RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting
Wang, Jieting, Shang, Xiaolei, Li, Feijiang, Peng, Furong
Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches, including transformer- and multilayer-perceptron-based models, optimize the Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. The code is publicly available at: https://github.com/shang-xl/RI-Loss.
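The HSIC quantity at the core of the loss can be sketched with the standard biased empirical estimator, HSIC(X, Y) = (1/n²) tr(K H L H), where K and L are kernel matrices and H is the centering matrix (bandwidths and data below are illustrative; the authors' actual RI-Loss implementation lives in their repository):

```python
import numpy as np

# Biased empirical HSIC estimator with RBF kernels; a dependence measure
# between two sequences. Bandwidth sigma and the toy data are assumptions.
def rbf_kernel(z, sigma=1.0):
    sq = (z[:, None] - z[None, :]) ** 2
    return np.exp(-sq / (2.0 * sigma ** 2))

def hsic(x, y):
    n = len(x)
    K, L = rbf_kernel(x), rbf_kernel(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
residuals = rng.standard_normal(64)       # stand-in for forecast residuals
random_series = rng.standard_normal(64)   # independent random time series
dependent = residuals + 0.1 * rng.standard_normal(64)

print(hsic(residuals, random_series))  # small: near-independent sequences
print(hsic(residuals, dependent))      # larger: strongly dependent sequences
```

The estimator is non-negative, close to zero for independent sequences, and grows with dependence, which is what lets a residual-based loss term penalize or encourage structure in the residuals.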